When are correlations strong? 
The inverse problem of statistical mechanics involves finding the minimal Hamiltonian that is 
consistent with some observed set of correlation functions. This problem has received renewed in- 
terest in the analysis of biological networks; in particular, several such networks have been described 
successfully by maximum entropy models consistent with pairwise correlations. These correlations 
are usually weak in an absolute sense (e.g., correlation coefficients ~ 0.1 or less), and this is some- 
times taken as evidence against the existence of interesting collective behavior in the network. If 
correlations are weak, it should be possible to capture their effects in perturbation theory, so we 
develop an expansion for the entropy of Ising systems in powers of the correlations, carrying this out 
to fourth order. We then consider recent work on networks of neurons [Schneidman et al., Nature 
440, 1007 (2006); Tkacik et al., arXiv:0912.5409 [q-bio.NC] (2009)], and show that even though 
all pairwise correlations are weak, the fact that these correlations are widespread means that their 
impact on the network as a whole is not captured in the leading orders of perturbation theory. 
More positively, this means that recent successes of maximum entropy approaches are not simply 
the result of correlations being weak. 



I. INTRODUCTION 

Most of what is interesting about the phenomena of 
life results from interaction among large networks of 
elements — protein structures are stabilized by networks 
of interactions among amino acids, metabolism is gov- 
erned by a network of enzymatic reactions, decisions 
about cell fate during embryonic development are deter- 
mined by a network of genetic regulatory interactions, 
and our perceptions are shaped by dynamic interactions 
among networks of neurons. Physicists have long hoped 
that the behavior of such large networks could be ap- 
proached using ideas from statistical mechanics, and this 
idea has been explored most fully in the context of neural
networks [1, 2]. In contrast to the usual statistical
mechanics problems, however, it is not clear how to
measure the macroscopic "thermodynamic" properties of 
these networks. On the other hand, a new generation of 
experiments is making it possible to observe something 
closer to the microscopic state of these networks, for ex- 
ample recording the activity of large numbers of neurons 
simultaneously [3-5]. Given such data, what can we say
about the global structure of the network? 

Although experiments are continually improving, they 
will never get to the point that they can sample fully the 
state space of even modest sized networks. What such 
data can provide, with high precision, is data on a finite 
set of correlation functions or expectation values, or the 
distributions of some small set of order parameters. To 
make progress toward a global description of the network, 
we need to solve an inverse problem. In the language of 
statistical mechanics, we are given the expectation values 
of various operators, and we need to infer the underlying 
Hamiltonian. In general, of course, this is ill-posed. Recently,
a number of groups have explored the possibility
that this inverse problem can be successfully regularized
using the classical idea of maximum entropy [6].

To make these ideas concrete, note that the electrical 
activity in networks of neurons consists of discrete, identical
pulses termed action potentials or spikes [7, 8]. In a
small window of time, each neuron either generates one 
spike or remains silent, so that the state of the system is 
described naturally by Ising spins, spin up for a spike and 
spin down for silence. Knowing the mean rate at which 
each cell generates spikes is equivalent to knowing the 
mean magnetization of each spin, and the probability of 
two cells generating spikes in the same small window of 
time is related to the spin-spin correlation function. The 
maximum entropy model consistent with knowledge of 
the mean spike rates and pairwise correlations is then an 
Ising model with pairwise interactions [THl HJ , and the 
relevant inverse problem is to determine the magnetic 
fields and spin-spin interactions from measurements of 
the magnetization and two-point correlation functions. 
In general these systems are inhomogeneous and likely 
to be glassy, since neurons can be both positively and 
negatively correlated. 
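
As a minimal sketch of the mapping just described, the following Python converts a binary spike raster into Ising spins and estimates the mean magnetizations and pairwise correlation coefficients. The function names and the toy raster are hypothetical, introduced only for illustration; real data would also require a choice of time bin, which is not modeled here.

```python
import math

def spikes_to_spins(raster):
    """Map a binary raster (rows: time bins, columns: cells) to spins:
    1 (spike) -> +1, 0 (silence) -> -1."""
    return [[1 if x else -1 for x in row] for row in raster]

def mean_magnetizations(spins):
    """Time-averaged spin <sigma_i> for each cell."""
    n_bins = len(spins)
    n_cells = len(spins[0])
    return [sum(row[i] for row in spins) / n_bins for i in range(n_cells)]

def correlation_coefficients(spins):
    """Pairwise correlation coefficients C_ij of the spin variables.
    For +/-1 spins the variance is exactly 1 - <sigma_i>^2, so a cell that
    is always silent (or always spiking) has zero variance and must be
    dropped before calling this function."""
    n_bins = len(spins)
    n_cells = len(spins[0])
    m = mean_magnetizations(spins)
    C = [[0.0] * n_cells for _ in range(n_cells)]
    for i in range(n_cells):
        for j in range(n_cells):
            cov = sum((row[i] - m[i]) * (row[j] - m[j]) for row in spins) / n_bins
            C[i][j] = cov / math.sqrt((1 - m[i] ** 2) * (1 - m[j] ** 2))
    return C

# Toy raster, invented for illustration: 4 time bins, 2 cells.
raster = [[1, 1], [0, 0], [1, 1], [0, 1]]
spins = spikes_to_spins(raster)
print(mean_magnetizations(spins))
print(correlation_coefficients(spins)[0][1])
```

These two sets of numbers are exactly the inputs that the pairwise maximum entropy construction takes as constraints.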

Interest in the inverse statistical mechanics approach 
to biological networks has been raised by several demon- 
strations that maximum entropy models built from pair- 
wise correlations succeed in capturing the higher-order 
structure of these systems. Examples include neurons
in the retina [11-14], in cultured networks [11, 15], and
in the cortex [16, 17]. An independent stream of work
has shown that functional proteins can be constructed
by drawing randomly from an ensemble of amino acid sequences
that reproduces the correlations between substitutions
at pairs of sites across known families of proteins
[18-21]. This construction is, in certain limits, equivalent
to a maximum entropy model [22], and this approach has
now been applied to a wide range of different proteins [23-25].
Pairwise maximum entropy models have also been
used to describe biochemical and genetic [27] networks,
and even the spelling rules for four letter words [28].

The full power of statistical mechanics approaches lies 
in the limit of large networks. Generations of theoreti- 
cal studies have led us to hope for interesting collective 
behavior in these systems, which of course becomes clear 
only in the large N limit. The maximum entropy construction
provides a bridge between these theoretical expectations
and real data [11, 12]. The search for collective
effects in large networks is also a subject of controversy, 
and settling these controversies will require actually solv- 
ing the inverse problem in large systems. Recent work 
suggests some promising approaches, but also highlights 
the difficulties of the problem [29 - 31] . 

Here we try to make progress on something more mod- 
est than the full inverse problem. We start with the ob- 
servation that individual pairwise correlations often are 
weak in an absolute sense; for example, the correlation 
coefficient between the activity of two neurons typically 
is C ~ 0.1 or less. This apparently weak correlation sug- 
gests that the effects of correlations will be small, and 
indeed if one looks at small groups of neurons this must 
be true. More formally, if we believe that correlations 
are weak, then it should be possible to capture the im- 
pact of these correlations in perturbation theory. Here we 
develop this perturbation theory, evaluating the entropy 
of an Ising system out to fourth order in the spin-spin 
correlations [35]. We apply our results to re-analyze the 
correlations among neurons in the vertebrate retina [11].

Our primary conclusions are negative: real networks 
of neurons in the retina are outside the regime in which 
we can expect the leading orders of perturbation theory 
to capture the impact of the measured correlations, and 
we argue that this is true more generally for biological 
networks. But even this negative result is important, be- 
cause it shows us that the successes of maximum entropy 
models thus far are not simply the result of correlations 
being weak, so that even in groups of 20 or 40 neurons
we are seeing meaningful hints of the emergent, collec- 
tive behavior predicted by these models. The perturba- 
tion expansion also highlights the difficulties of defining 
a thermodynamic limit for these systems. 



II. CALCULATION OF THE ENTROPY 

We are interested in a system of $N$ spins $\{\sigma_i\}$, which
represent the states of a biological network. As noted
above, this representation is especially simple for neurons,
where $\sigma_i = +1$ marks the occurrence of an action
potential from neuron $i$ in a small time window, while
$\sigma_i = -1$ indicates that neuron $i$ is silent. Although the
problem is quite general, when we want to speak concretely
we will use such networks of neurons as our example.

Once the number of elements N in our system is large, 
no reasonable experiment can lead directly to an estimate 
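of the full probability distribution $P(\{\sigma_i\})$ describing the
states of the system as a whole. What we can hope to
measure are expectation values for low-order operators,
such as the mean magnetization of each spin,

$$\langle \sigma_i \rangle = \sum_{\{\sigma_i\}} P(\{\sigma_i\})\, \sigma_i , \tag{1}$$

and the spin-spin correlation,

$$\langle \sigma_i \sigma_j \rangle = \sum_{\{\sigma_i\}} P(\{\sigma_i\})\, \sigma_i \sigma_j . \tag{2}$$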



In the maximum entropy formulation, one constructs 
a distribution which maximizes the entropy $S[P(\{\sigma_i\})]$,
subject to constraints. The entropy is defined as usual
by

$$S[P(\{\sigma_i\})] = -\sum_{\{\sigma_i\}} P(\{\sigma_i\}) \ln\left[P(\{\sigma_i\})\right] , \tag{3}$$



where we measure entropy in nats unless stated other- 
wise. One can in principle write down a maximum en- 
tropy distribution consistent with higher order correla- 
tions, but if we keep just the one-point and two-point 
expectation values then the solution to the resulting con- 
strained maximization problem is 

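$$P(\{\sigma_i\}) = \frac{1}{Z(\{h_i, J_{ij}\})} \exp\left( \sum_i h_i \sigma_i + \frac{1}{2} \sum_{i \neq j} J_{ij} \sigma_i \sigma_j \right) , \tag{4}$$

where $Z(\{h_i, J_{ij}\})$, the partition function, is given by

$$Z(\{h_i, J_{ij}\}) = \sum_{\{\sigma_i\}} \exp\left( \sum_i h_i \sigma_i + \frac{1}{2} \sum_{i \neq j} J_{ij} \sigma_i \sigma_j \right) . \tag{5}$$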

The numbers $\{h_i, J_{ij}\}$ are Lagrange multipliers which are
fixed by imposing the constraints in Eqs. (1) and (2). One
immediately recognizes Eq. (4) as the Ising model where
the interactions $\{J_{ij}\}$ exist (potentially) between all pairs
of spins.

The difficulty in computing the entropy $S[P(\{\sigma_i\})]$ directly
from the distribution in Eq. (4) is that of imposing
the constraints in Eqs. (1) and (2). This corresponds to
solving the $N(N+1)/2$ simultaneous equations

$$\langle \sigma_i \rangle = \frac{\partial \ln Z(\{h_i, J_{ij}\})}{\partial h_i} , \tag{6}$$

$$\langle \sigma_i \sigma_j \rangle = \frac{\partial \ln Z(\{h_i, J_{ij}\})}{\partial J_{ij}} . \tag{7}$$



Our goal is to develop a perturbative approach to this 
problem. At the risk of being pedantic, we present the 
development in some detail, hopefully making the dis- 
cussion accessible to a broader audience with interests in 
biological networks. 
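
Before turning to perturbation theory, it may help to see the forward problem solved by brute force. The sketch below enumerates all $2^N$ states of a pairwise Ising model, assuming the convention of Eq. (4) with a symmetric coupling matrix and zero diagonal, and returns $\ln Z$, the one- and two-point expectation values, and the entropy in nats. The function name `ising_exact` and the example couplings are hypothetical, and exhaustive enumeration is only practical for small $N$.

```python
import itertools
import math

def ising_exact(h, J):
    """Exact enumeration for P ~ exp(sum_i h_i s_i + (1/2) sum_{i!=j} J_ij s_i s_j).
    h: list of N fields; J: symmetric N x N list of lists with zero diagonal.
    Returns (lnZ, means, pair_moments, entropy_in_nats)."""
    n = len(h)
    states = list(itertools.product((-1, 1), repeat=n))
    log_w = []
    for s in states:
        e = sum(h[i] * s[i] for i in range(n))
        e += 0.5 * sum(J[i][j] * s[i] * s[j]
                       for i in range(n) for j in range(n) if i != j)
        log_w.append(e)
    # Subtract the maximum exponent before exponentiating, for stability.
    mx = max(log_w)
    ln_z = mx + math.log(sum(math.exp(e - mx) for e in log_w))
    p = [math.exp(e - ln_z) for e in log_w]
    means = [sum(pk * s[i] for pk, s in zip(p, states)) for i in range(n)]
    pairs = [[sum(pk * s[i] * s[j] for pk, s in zip(p, states))
              for j in range(n)] for i in range(n)]
    entropy = -sum(pk * math.log(pk) for pk in p if pk > 0.0)
    return ln_z, means, pairs, entropy

# Two spins, no fields, one coupling (values chosen only for illustration).
ln_z, means, pairs, entropy = ising_exact([0.0, 0.0], [[0.0, 0.5], [0.5, 0.0]])
print(ln_z, means, pairs[0][1], entropy)
```

The inverse problem runs this map backwards: given the measured moments, find the $\{h_i, J_{ij}\}$ that reproduce them, which is what becomes hard at large $N$.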
A. The General Case 

Let's start with a very general approach, in which we 
imagine that we know the expectation values for some 
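set of operators $\hat{O}_\mu(\{\sigma_i\})$, $\mu = 1, 2, \cdots, K$. Then the
partition function for the maximum entropy distribution
takes the form

$$Z(\{g_\mu\}) = \sum_{\{\sigma_i\}} \exp\left[ \sum_{\mu=1}^{K} g_\mu \hat{O}_\mu(\{\sigma_i\}) \right] . \tag{8}$$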



The $\{g_\mu\}$ represent the coupling constants of the system,
which arise as Lagrange multipliers in the constrained
maximization problem from which this partition function
originated. The coupling constants are determined by the
$K$ simultaneous equations

$$\langle \hat{O}_\mu(\{\sigma_i\}) \rangle = \frac{\partial \ln Z(\{g_\nu\})}{\partial g_\mu} . \tag{9}$$



We assume that there is some 'zero order' condition in
which the expectation values are $\langle \hat{O}_\mu(\{\sigma_i\}) \rangle^{(0)}$ and the
corresponding coupling constants are $g_\mu^0$. If we observe
that expectation values are slightly different from their
zero order values, this should have a proportionally small
effect on the entropy, and this is what we want to calculate
in perturbation theory.

As is usual in statistical mechanics, we can relate the
entropy to derivatives of the free energy,

$$S(\{g_\nu\}) = \ln Z(\{g_\nu\}) - \sum_\mu g_\mu \frac{\partial \ln Z(\{g_\nu\})}{\partial g_\mu} . \tag{10}$$



Note that, in this view, entropy is a function of the cou- 
pling constants, and only implicitly a function of the mea- 
sured expectation values. To make the dependence on 
expectation values explicit, we consider 



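\begin{align}
\frac{\partial S}{\partial \langle \hat{O}_\mu \rangle} &= \sum_\nu \frac{\partial S}{\partial g_\nu} \frac{\partial g_\nu}{\partial \langle \hat{O}_\mu \rangle} \tag{11} \\
&= -\sum_\nu \left[ \sum_\lambda g_\lambda \frac{\partial \langle \hat{O}_\lambda \rangle}{\partial g_\nu} \right] \frac{\partial g_\nu}{\partial \langle \hat{O}_\mu \rangle} \tag{12} \\
&= -\sum_\lambda g_\lambda \sum_\nu \frac{\partial \langle \hat{O}_\lambda \rangle}{\partial g_\nu} \frac{\partial g_\nu}{\partial \langle \hat{O}_\mu \rangle} \tag{13} \\
&= -g_\mu . \tag{14}
\end{align}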


To use this expression we should view the coupling
constants as functions of the expectation values, $g_\mu = g_\mu(\{\langle \hat{O}_\nu \rangle\})$. In the zero order state we have

$$g_\mu = g_\mu^0 = g_\mu(\{\langle \hat{O}_\nu \rangle^{(0)}\}) , \tag{15}$$

and we measure the deviations from this state as

$$\delta g_\mu = g_\mu - g_\mu^0 . \tag{16}$$

Similarly, we define the deviation of the operators from
their zeroth order values to be

$$\Delta \hat{O}_\mu = \hat{O}_\mu - \langle \hat{O}_\mu \rangle^{(0)} . \tag{17}$$



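Then the entropy in the state we are interested in can be
found by integrating, starting from the zero order state:

$$S = S(\{\langle \hat{O}_\alpha \rangle^{(0)}\}) - \sum_\alpha g_\alpha^0 \langle \Delta \hat{O}_\alpha \rangle - \sum_\alpha \int_{\{0\}}^{\{\langle \Delta \hat{O}_\alpha \rangle\}} d\langle \Delta \hat{O}_\alpha \rangle \, \delta g_\alpha , \tag{18}$$

where $S(\{\langle \hat{O}_\alpha \rangle^{(0)}\})$ is the entropy of the zeroth order
distribution.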

Now our task is clear — we need to develop a perturba- 
tion theory for the coupling constants themselves. The 
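zeroth order couplings define expectation values with respect
to the induced zeroth order distribution in the usual
way,

$$\langle (\cdots) \rangle^{(0)} = \frac{1}{Z_0(\{g_\nu^0\})} \sum_{\{\sigma_i\}} \exp\left[ \sum_\mu g_\mu^0 \hat{O}_\mu(\{\sigma_i\}) \right] (\cdots) , \tag{19}$$

$$Z_0(\{g_\nu^0\}) \equiv \sum_{\{\sigma_i\}} \exp\left[ \sum_\mu g_\mu^0 \hat{O}_\mu(\{\sigma_i\}) \right] . \tag{20}$$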


Using these definitions, one can rewrite the partition 
function as 



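$$Z(\{g_\nu\}) = Z_0(\{g_\nu^0\}) \exp\left[ \sum_\mu \delta g_\mu \langle \hat{O}_\mu \rangle^{(0)} \right] \times \left\langle \exp\left[ \sum_\mu \delta g_\mu \Delta \hat{O}_\mu \right] \right\rangle^{(0)} , \tag{21}$$

where for convenience we drop the explicit dependence
of our operators on the binary variables. From this expression
we use the cumulant expansion to develop $\ln Z$
as a power series in $\delta g$, and then differentiate to obtain
the expectation values. The result is that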



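$$\langle \hat{O}_\mu \rangle = \langle \hat{O}_\mu \rangle^{(0)} + \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \rangle^{(0)} \delta g_\nu + \frac{1}{2} \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \rangle^{(0)} \delta g_\nu \delta g_\lambda + \frac{1}{3!} \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \Delta\hat{O}_\rho \rangle_c^{(0)} \delta g_\nu \delta g_\lambda \delta g_\rho + \cdots , \tag{22}$$

where we sum over repeated indices for the remainder of this section, and the subscript $c$ marks the connected part (the fourth cumulant) of the fourth order term. We can rewrite this as

$$\delta g_\alpha = (\chi^{-1})_{\alpha\mu} \langle \Delta\hat{O}_\mu \rangle - \frac{1}{2} (\chi^{-1})_{\alpha\mu} \delta g_\nu \delta g_\lambda \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \rangle^{(0)} - \frac{1}{3!} (\chi^{-1})_{\alpha\mu} \delta g_\nu \delta g_\lambda \delta g_\rho \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \Delta\hat{O}_\rho \rangle_c^{(0)} + \cdots , \tag{23}$$

where we identify the susceptibility

$$\chi_{\mu\nu} = \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \rangle^{(0)} . \tag{24}$$

Iterating this series expansion, we find

$$\begin{aligned}
\delta g_\alpha = {} & (\chi^{-1})_{\alpha\mu} \langle \Delta\hat{O}_\mu \rangle - \frac{1}{2} (\chi^{-1})_{\alpha\mu} (\chi^{-1})_{\nu\beta} (\chi^{-1})_{\lambda\gamma} \langle \Delta\hat{O}_\beta \rangle \langle \Delta\hat{O}_\gamma \rangle \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \rangle^{(0)} \\
& + \frac{1}{2} (\chi^{-1})_{\alpha\mu} (\chi^{-1})_{\nu\beta} (\chi^{-1})_{\lambda\kappa} (\chi^{-1})_{\rho\gamma} (\chi^{-1})_{\tau\epsilon} \langle \Delta\hat{O}_\beta \rangle \langle \Delta\hat{O}_\gamma \rangle \langle \Delta\hat{O}_\epsilon \rangle \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \rangle^{(0)} \langle \Delta\hat{O}_\kappa \Delta\hat{O}_\rho \Delta\hat{O}_\tau \rangle^{(0)} \\
& - \frac{1}{3!} (\chi^{-1})_{\alpha\mu} (\chi^{-1})_{\nu\beta} (\chi^{-1})_{\lambda\gamma} (\chi^{-1})_{\rho\epsilon} \langle \Delta\hat{O}_\beta \rangle \langle \Delta\hat{O}_\gamma \rangle \langle \Delta\hat{O}_\epsilon \rangle \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \Delta\hat{O}_\rho \rangle_c^{(0)} + \cdots .
\end{aligned} \tag{25}$$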



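Having obtained an expression for the couplings perturbatively, we use the coupling constant integration [Eq. (18)]
to generate an expansion for the entropy. The result is

$$\begin{aligned}
S = {} & S(\{\langle \hat{O}_\alpha \rangle^{(0)}\}) - g_\mu^0 \langle \Delta\hat{O}_\mu \rangle - \frac{1}{2} (\chi^{-1})_{\alpha\mu} \langle \Delta\hat{O}_\alpha \rangle \langle \Delta\hat{O}_\mu \rangle \\
& + \frac{1}{3!} (\chi^{-1})_{\alpha\mu} (\chi^{-1})_{\beta\nu} (\chi^{-1})_{\gamma\lambda} \langle \Delta\hat{O}_\alpha \rangle \langle \Delta\hat{O}_\beta \rangle \langle \Delta\hat{O}_\gamma \rangle \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \rangle^{(0)} \\
& - \frac{1}{8} (\chi^{-1})_{\alpha\mu} (\chi^{-1})_{\beta\nu} (\chi^{-1})_{\lambda\kappa} (\chi^{-1})_{\gamma\rho} (\chi^{-1})_{\epsilon\tau} \langle \Delta\hat{O}_\alpha \rangle \langle \Delta\hat{O}_\beta \rangle \langle \Delta\hat{O}_\gamma \rangle \langle \Delta\hat{O}_\epsilon \rangle \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \rangle^{(0)} \langle \Delta\hat{O}_\kappa \Delta\hat{O}_\rho \Delta\hat{O}_\tau \rangle^{(0)} \\
& + \frac{1}{4!} (\chi^{-1})_{\alpha\mu} (\chi^{-1})_{\beta\nu} (\chi^{-1})_{\gamma\lambda} (\chi^{-1})_{\epsilon\rho} \langle \Delta\hat{O}_\alpha \rangle \langle \Delta\hat{O}_\beta \rangle \langle \Delta\hat{O}_\gamma \rangle \langle \Delta\hat{O}_\epsilon \rangle \langle \Delta\hat{O}_\mu \Delta\hat{O}_\nu \Delta\hat{O}_\lambda \Delta\hat{O}_\rho \rangle_c^{(0)} + \cdots .
\end{aligned} \tag{26}$$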



These results allow us to express the entropy — or, more 
precisely, the maximum possible entropy — as a function 
of experimentally observable expectation values, assum- 
ing that these are close to some reference state which we 
understand exactly. 



B. The Pairwise Maximum Entropy Model 

In the pairwise maximum entropy model one assumes
that the operators $\hat{O}_\mu$ take on two distinct expressions
depending on their index. In the first sector, $\mu = i \Rightarrow \hat{O}_\mu = \sigma_i$,
and in the second, $\mu = ij \Rightarrow \hat{O}_\mu = \sigma_i \sigma_j$. The general
partition function of the last section [Eq. (8)] then
reduces to the partition function of the Ising model in
Eq. (5). One can write down the entropy for this latter
partition function in perturbation theory in terms of
empirical quantities, namely, in terms of the one- and
two-point correlation functions $\{\langle \sigma_i \rangle\}$ and $\{\langle \sigma_i \sigma_j \rangle\}$ respectively.
The final form of this entropy is that of our
earlier result, Eq. (26). Here, we rewrite Eq. (26) in terms
of quantities defined by the pairwise maximum entropy
model, leaving the details of the calculation to the appendix.

We begin with a form for the partition function which
re-expresses the operators appearing in the pairwise construction
such that their expectation values vanish in the
zeroth order distribution. Namely we consider the following
variant of Eq. (5),

$$Z(\{h_i, J_{ij}\}) = \sum_{\{\sigma_i\}} \exp\left( \sum_i h_i \delta\sigma_i + \sum_{i<j} J_{ij} \delta\sigma_i \delta\sigma_j \right) , \tag{27}$$

where $\delta\sigma_i = \sigma_i - \langle \sigma_i \rangle^{(0)}$. The zero order coupling
constants $\{g_\mu^0\}$ correspond to a noninteracting model,
$J_{ij} = 0$, with the $\{h_i^0\}$ chosen to reproduce the observed
mean magnetizations,

$$\langle \sigma_i \rangle = \langle \sigma_i \rangle^{(0)} = \tanh(h_i^0) . \tag{28}$$

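One can rewrite the partition function in Eq. (27) as

$$Z(\{h_i, J_{ij}\}) = Z_0(\{h_i^0\}) \left\langle \exp\left[ \sum_i \delta h_i \delta\sigma_i + \sum_{i<j} J_{ij} \delta\sigma_i \delta\sigma_j \right] \right\rangle^{(0)} , \tag{29}$$

where $\delta h_i = h_i - h_i^0$ and

$$Z_0(\{h_i^0\}) = \sum_{\{\sigma_i\}} \exp\left( \sum_i h_i^0 \delta\sigma_i \right) . \tag{30}$$

As usual, zero order expectation values are defined by

$$\langle (\cdots) \rangle^{(0)} = \frac{1}{Z_0(\{h_i^0\})} \sum_{\{\sigma_i\}} \exp\left( \sum_i h_i^0 \delta\sigma_i \right) (\cdots) . \tag{31}$$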



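Proceeding as in the general case, we obtain the analogue
of Eq. (26). The result can depend only on the
experimental one point correlations $\{\langle \sigma_i \rangle^{(0)} = \langle \sigma_i \rangle\}$ and
the experimental two point correlations, which we summarize
by the correlation coefficients

$$C_{ij} = \frac{\langle \delta\sigma_i \delta\sigma_j \rangle}{\sqrt{\langle (\delta\sigma_i)^2 \rangle^{(0)} \langle (\delta\sigma_j)^2 \rangle^{(0)}}} . \tag{32}$$

We write the entropy as

$$S(\{\langle \sigma_i \rangle^{(0)}, C_{ij}\}) = S_0(\{\langle \sigma_i \rangle^{(0)}\}) + \Delta S(\{\langle \sigma_i \rangle^{(0)}, C_{ij}\}) , \tag{33}$$

where

$$S_0 = N + \frac{1}{\ln 2} \sum_{i=1}^{N} \ln\cosh\left( \tanh^{-1} \langle \sigma_i \rangle^{(0)} \right) - \frac{1}{\ln 2} \sum_{i=1}^{N} \langle \sigma_i \rangle^{(0)} \tanh^{-1}\left( \langle \sigma_i \rangle^{(0)} \right) \ \text{bits} \tag{34}$$

is the entropy of the noninteracting system, and then
collect terms with successive powers of the $C_{ij}$:

$$\Delta S = \Delta S_2 + \Delta S_3 + \Delta S_4 + \cdots , \tag{35}$$

where

$$\Delta S_2(\{C_{ij}\}) = -\frac{1}{4 \ln 2} \sum_{i \neq j} C_{ij}^2 \ \text{bits} , \tag{36}$$

$$\Delta S_3(\{\langle \sigma_i \rangle^{(0)}, C_{ij}\}) = \frac{1}{3! \ln 2} \sum_{i \neq j \neq k} C_{ij} C_{jk} C_{ki} + \frac{1}{3 \ln 2} \sum_{i \neq j} C_{ij}^3 \, \frac{\langle \sigma_i \rangle^{(0)} \langle \sigma_j \rangle^{(0)}}{\sqrt{\langle (\delta\sigma_i)^2 \rangle^{(0)} \langle (\delta\sigma_j)^2 \rangle^{(0)}}} \ \text{bits} . \tag{37}$$

Details of the computation, including the fourth order
term $\Delta S_4$, are collected in the Appendices.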
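
The low-order terms are simple enough to evaluate directly. The Python below is a sketch, not the authors' code: it computes $S_0$, $\Delta S_2$, and $\Delta S_3$ in bits from a list of mean magnetizations `m` and a correlation matrix `C`. The third order term is written in the form that follows from Eq. (26) for independent $\pm 1$ spins, where the variance is $1 - \langle\sigma_i\rangle^2$ and the third central moment is $-2\langle\sigma_i\rangle(1 - \langle\sigma_i\rangle^2)$; treat that form, and all function names, as assumptions of this example.

```python
import math

LN2 = math.log(2.0)

def s0_bits(m):
    """Entropy of independent spins with means m[i], in bits (Eq. 34)."""
    return sum(1.0 + (math.log(math.cosh(math.atanh(mi))) - mi * math.atanh(mi)) / LN2
               for mi in m)

def ds2_bits(C):
    """Second order correction: -(1 / 4 ln 2) * sum_{i != j} C_ij^2."""
    n = len(C)
    return -sum(C[i][j] ** 2 for i in range(n) for j in range(n) if i != j) / (4.0 * LN2)

def ds3_bits(m, C):
    """Third order correction: a triangle term over distinct triples plus a
    same-pair term weighted by the means (assumed form, see lead-in)."""
    n = len(C)
    rms = [math.sqrt(1.0 - mi * mi) for mi in m]
    pair = sum(C[i][j] ** 3 * m[i] * m[j] / (rms[i] * rms[j])
               for i in range(n) for j in range(n) if i != j)
    tri = sum(C[i][j] * C[j][k] * C[k][i]
              for i in range(n) for j in range(n) for k in range(n)
              if i != j and j != k and k != i)
    return (pair / 3.0 + tri / 6.0) / LN2

# Illustrative inputs only: three cells with equal means and weak correlations.
m = [0.2, 0.2, 0.2]
C = [[1.0, 0.05, 0.05], [0.05, 1.0, 0.05], [0.05, 0.05, 1.0]]
print(s0_bits(m), ds2_bits(C), ds3_bits(m, C))
```

For two spins the joint distribution is fixed completely by the two means and the one correlation, so the exact entropy is available in closed form; this gives a convenient check that adding the third order term moves the estimate closer to the exact answer when the correlation is small.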



C. Remarks on the thermodynamic limit 

Many biological networks are large, and it is tempting
to think that the essence of their behavior can be derived
in the thermodynamic limit, $N \to \infty$. We expect that,
in this limit, the entropy is extensive, that is $S \propto N$. But
we have to be careful about how we define this limit, and
what is held fixed as $N$ varies.

To illustrate the problem, consider the entropy to second
order in the correlations (in bits),

$$S \approx S_0 - \frac{1}{4 \ln 2} \sum_{i \neq j} C_{ij}^2 . \tag{38}$$

As $N$ becomes large, we can write this as

$$S \approx S_0 - \frac{N(N-1)}{4 \ln 2} \langle C^2 \rangle , \tag{39}$$

where $\langle C^2 \rangle$ denotes the average squared correlation coefficient
in the network. It is the entropy per spin which
should be finite as $N \to \infty$,

$$\frac{S}{N} \approx \frac{S_0}{N} - \frac{N \langle C^2 \rangle}{4 \ln 2} . \tag{40}$$

To enforce the existence of a thermodynamic limit, it is
tempting to say that we must have $N \langle C^2 \rangle$ be finite as
$N$ grows large. The difficulty is that $\langle C^2 \rangle$ is an experimental
quantity, not something we are free to adjust
theoretically.

We recall that, if we are studying a large system with a
well defined geometry and a finite correlation length, then
we expect correlations to decay with the distance between
spins. Roughly speaking, in dimension $d$ we expect that
of the $\sim N^2$ pairs of spins that we could choose, only
$\sim N(\xi/a)^d$ have significant correlations, where $\xi$ is the
correlation length and $a$ is the lattice spacing or typical
distance between spins. Thus the mean square correlation
will scale as

$$\langle C^2 \rangle \sim C_0^2 \, \frac{N(\xi/a)^d}{N^2} \propto \frac{1}{N} , \tag{41}$$

where $C_0$ is the correlation between nearby spins. In
this scenario, $N \langle C^2 \rangle$ indeed is finite at large $N$, and the
entropy is extensive, as it should be.

Because neurons connect through structures (axons 
and dendrites) that can be much longer than the spacing 
between neurons, many interesting biological networks or 
sub-networks do not have a clear notion of geometry or 
locality. The result is that correlations need not have a 
systematic dependence on the distance between cells, and 
so the system is more nearly mean-field-like. In a truly 
mean-field system, all $C_{ij}$ would be drawn from the same
distribution, and to enforce extensivity would require this
distribution to have $\langle C^2 \rangle \propto 1/N$. But, again, $\langle C^2 \rangle$ is an
experimentally accessible quantity.

If we have a system of N neurons with connections 
that reach across the entire network, it is plausible that 
recording from two neurons at random we will measure a 
correlation coefficient that is independent of the distance 
between cells and represents a sample from the overall 
distribution $P(C)$. In the salamander retina, for example,
there is no systematic relationship between correlations
and distance as long as we stay within a radius of
$\sim 200\,\mu{\rm m}$, and within such a correlated patch there are
$N \sim 200$ cells [33]. In such networks, $\langle C^2 \rangle$ is a number
we can measure by sampling many pairs of cells, even if
we can never record from all $N$ cells simultaneously.

If we imagine networks with different values of $N$ but
the same value of $\langle C^2 \rangle$, corresponding to what we measure
in a real network, this family of hypothetical networks
will have an entropy per spin that varies with $N$,
even at large $N$. In this sense, there is no simple thermodynamic
limit. We can think about increasing $N$ at fixed
$\langle C^2 \rangle$ as being like changing temperature, as in the qualitative
discussion of Ref. [11], or we can try to estimate the
actual value of $N \langle C^2 \rangle$ in the real system, and imagine a
system in which $N \to \infty$ but $N \langle C^2 \rangle$ is fixed to its experimental
value. A key point, which will be reinforced by
the more detailed calculations below, is that $N \langle C^2 \rangle$ can
be large even when all $C_{ij}$ are small, so that the impact
of correlations on the entropy per spin depends on the
size of the system.
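
The arithmetic behind this point is short enough to spell out. The sketch below evaluates the second order entropy reduction per spin from Eq. (40), $N\langle C^2\rangle / (4 \ln 2)$ bits, at fixed $\langle C^2 \rangle$ and growing $N$. The value of `mean_c2` is a hypothetical round number (an rms correlation of 0.05), not a measurement from the data discussed here.

```python
import math

def entropy_loss_per_spin_bits(n_cells, mean_c2):
    """Second order entropy reduction per spin, N <C^2> / (4 ln 2), in bits."""
    return n_cells * mean_c2 / (4.0 * math.log(2.0))

# Hypothetical rms correlation of 0.05, held fixed while N grows.
mean_c2 = 0.05 ** 2
for n_cells in (10, 40, 200):
    print(n_cells, entropy_loss_per_spin_bits(n_cells, mean_c2))
```

Because the result is linear in $N$, a correlation scale that looks negligible for a handful of cells can remove a substantial fraction of a bit per cell once the whole correlated patch is included.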



III. RESULTS FOR A NETWORK OF REAL 
NEURONS 

Interest in maximum entropy approaches to real biological
networks was stimulated by Ref. [11], which analyzed
the responses of neurons in a small patch of the
salamander retina as they responded to a naturalistic
movie. The experiment used an array of electrodes to
record from forty neurons within a radius of $\sim 200\,\mu{\rm m}$,
a region throughout which there is no systematic dependence
of correlations on distance, as described above. It
is reasonable, then, to view this experiment as a sample
from the $\sim 200$ neurons in the patch. In this sample, the
distribution of correlation coefficients is peaked near zero,
with almost all the weight at $C < 0.1$; a substantial fraction
of correlation coefficients are negative, and the experiment
is long enough that the threshold for statistical
significance is $|C| \sim 0.001$. These weak pairwise correlations
coexist with signatures of interesting collective behavior,
such as a long tail in the probability that $K$ of the
$N$ cells spike simultaneously, and dramatic discrepancies
between the probability of different 10-cell patterns of
response and the probabilities predicted if each cell were
acting independently. These discrepancies are resolved in
the maximum entropy model. This detailed analysis of
patterns of response in small sub-networks was extended
FIG. 1: Entropy vs the strength of correlations for 20
cells. As explained in the text, we consider a population
of neurons with measured mean spike rates $\{\langle \sigma_i \rangle\}$ and
correlation coefficients scaled by a factor $F$, $C_{ij} \to F C_{ij}$.
'2nd order' refers to the entropy to second order in perturbation
theory, i.e., $S(\{\langle \sigma_i \rangle^{(0)}, F C_{ij}\}) = S_0(\{\langle \sigma_i \rangle^{(0)}\}) + \Delta S_2(\{F C_{ij}\})$,
and similarly for the other orders. We note that
at $F \sim 0.5$, $\Delta S_3(\{\langle \sigma_i \rangle^{(0)}, F C_{ij}\})$ is roughly the same size as
$\Delta S_2(\{F C_{ij}\})$; perturbation theory is breaking down. Correlations
for which $F > 0.5$ can thus be considered to be large.



to groups of 20 and 40 cells [HI [T3] , showing for example 
that three-point correlations are well predicted from the 
maximum entropy model that incorporates only pairwise 
interactions. These successes invite extrapolation to the 
behavior of larger groups of cells, where collective effects 
are predicted to be even more dramatic [11, 12, 14].

Here we are interested in reanalyzing the data of Ref.
[11] using our perturbation theory for the entropy. We
lean on the results of Ref. [12], where numerical methods
were used to construct the pairwise maximum entropy
models for groups of $N = 20$ neurons (exactly) and
$N = 40$ neurons (approximately, matching the measured
$C_{ij}$ within $\sim 1\%$). These results give us essentially exact
answers for the (maximum) entropy in these groups
of cells, against which our perturbative results can be 
compared. 

We would also like to have an internal standard for the 
validity of perturbation theory. As usual, we can obtain 
such a criterion by asking whether successive orders of 
perturbation theory provide progressively smaller corrections,
in this case to the entropy. To gain control of the
calculation, we imagine a population of neurons in which
all correlation coefficients have been scaled by a factor $F$,
$C_{ij} \to F C_{ij}$, but the mean spike rates (the expectation
values $\langle \sigma_i \rangle$) are held fixed. Certainly as $F \to 0$ perturbation
theory should work, and as we increase $F \to 1$
we approach the real system. In Fig. 1 we present the
entropy as a function of $F$ for a group of $N = 20$ cells,
as calculated in different orders of perturbation theory,
comparing with the exact results [34].



FIG. 2: Entropy vs $F$ for (left to right) 15, 10 and 5 cells. These groups are nested subsets of the 20 cells used in Figure 1.



We see from Fig. 1 that, at $F = 1$, the third and fourth
order contributions to the entropy overcorrect the second
order approximation. The perturbative formalism
at this scale of the correlations lies outside its range of
validity. Scaling $F \to 0$, we see gross agreement between
the perturbative results and the numerical result
with convergence at roughly $F = 0.3$. Comparing only
successive contributions to the deviation of the entropy
from the independent entropy we note that the magnitude
of the third order correction $|\Delta S_3(\{\langle \sigma_i \rangle^{(0)}, F C_{ij}\})|$
is roughly the same as the magnitude of the second order
correction $|\Delta S_2(\{F C_{ij}\})|$ at $F \sim 0.5$. Qualitatively then,
it is at these values of the correlations that the perturbative
formalism for $N = 20$ cells is breaking down and
thus for $F > 0.5$, the correlations are effectively strong.

As discussed in relation to the thermodynamic limit, 
our perturbation series mixes a dependence on the cor- 
relations themselves with a dependence on the size of 
the system. If the scale of correlations is held fixed (for 
example, at the experimentally observed values!), then 
convergence of the series depends upon N. At smaller 
N, we expect that the perturbative approach will work 
for larger values of the correlations. To see this, we explicitly
consider subsets of 15, 10, and 5 cells out of the 20
we have analyzed so far. For these different values of $N$
we again trace the perturbative predictions for the entropy
as a function of $F$, which scales the correlations
relative to their experimental values; results are shown
in Fig. 2.

For 15 and 10 cells, we do not see convergent behaviour
of the series, and again the fourth order contribution
$\Delta S_4(\{\langle \sigma_i \rangle^{(0)}, F C_{ij}\})$ significantly compensates
for the third order contribution $\Delta S_3(\{\langle \sigma_i \rangle^{(0)}, F C_{ij}\})$. For
5 cells, the series seems to be displaying convergent 
behaviour, with each successive correction representing 
some fraction of the last one. Also, in this case the se- 
ries makes the sensible prediction that the entropy of the 
correlated system is smaller than when the correlations 
are zero; even this basic fact seems outside the reach of 
perturbation theory at larger N. 

Although it might be interesting to know the particu- 
lar answer for the entropy in specific groups of neurons, 
we are more interested in the overall validity of our per- 
turbative approach. As suggested above, we can think 
of any N cells we study as being drawn out of a larger 
population, and in this population we can compute av- 
erages of the correlations in the combinations that enter 
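the series for the entropy; our first example above was in
Eq. (39), and we can do this for every term in the series.
Up to third order, this yields

$$\begin{aligned}
S(N,F) = {} & N \left( 1 + \frac{1}{\ln 2} \left\langle \ln\cosh\left( \tanh^{-1} \langle \sigma_i \rangle^{(0)} \right) \right\rangle - \frac{1}{\ln 2} \left\langle \langle \sigma_i \rangle^{(0)} \tanh^{-1}\left( \langle \sigma_i \rangle^{(0)} \right) \right\rangle \right) \\
& - \frac{1}{4 \ln 2} N(N-1) F^2 \langle C_{ij}^2 \rangle_{i \neq j} \\
& + \frac{1}{3 \ln 2} N(N-1) F^3 \left\langle C_{ij}^3 \, \frac{\langle \sigma_i \rangle^{(0)} \langle \sigma_j \rangle^{(0)}}{\sqrt{\langle (\delta\sigma_i)^2 \rangle^{(0)} \langle (\delta\sigma_j)^2 \rangle^{(0)}}} \right\rangle_{i \neq j} \\
& + \frac{1}{3! \ln 2} N(N-1)(N-2) F^3 \langle C_{ij} C_{jl} C_{li} \rangle_{i \neq j \neq l} \ \text{bits} ,
\end{aligned} \tag{42}$$

where again, the averages $\langle \cdots \rangle$ above are taken empirically.

Two conditions will guide us in constructing a sensible
regime of validity for the series. The first is that, as with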
FIG. 3: Regime of validity (green) for the perturbative entropy $S(N, F)$ expressed as a function of the number of cells $N$ and
the correlation scale factor $F$. At left, results to third order in perturbation theory. The green region corresponds to that
part of our configuration space where the magnitude of the third order correction is less than 90% of the magnitude of the
second order correction ($|\Delta S_3(N, F)| < 0.9 |\Delta S_2(N, F)|$) and the total correlated entropy is less than the independent entropy.
We have not included the fourth order term $\Delta S_4(N, F)$ in our considerations here. At right, the green region corresponds to
that part of our configuration space where the magnitude of the fourth order correction is less than 90% of the magnitude
of the third order correction, which is in turn less than 90% of the magnitude of the second order correction ($|\Delta S_4(N, F)| <
0.9 |\Delta S_3(N, F)|$, $|\Delta S_3(N, F)| < 0.9 |\Delta S_2(N, F)|$) and the total correlated entropy is less than the independent entropy.



any perturbative series, successive corrections in the se- 
ries must be less than some fraction of the previous order 
correction. We will use, for the sake of being concrete, a 
figure of 90%. By this measure, the perturbative series 
will be said to have convergent behaviour at some combination
of $N$ and $F$ if the magnitude of the $k$'th order
correction $|\Delta S_k(N, F)|$ is less than 90% of the magnitude
of the $(k-1)$'th order correction $|\Delta S_{k-1}(N, F)|$ for
all $k \geq 3$ included in the construction of the space. We
also insist that a valid perturbation theory must predict
the entropy of the correlated system to be smaller than
that of the uncorrelated ($F = 0$) system, order by order.
With these criteria, we outline the regions of validity for
perturbation theory in Fig. 3. We emphasize that these
results are a combination of theory with the empirical val- 
ues for different moments of the correlation coefficients 
in the network of retinal neurons [11] . 
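
The two criteria can be turned into a simple test at each point $(N, F)$. The Python below is a sketch of the third order version, assuming the population-averaged form of Eq. (42): `mean_c2` stands for $\langle C_{ij}^2 \rangle$, `mean_c3_weighted` for the average of $C_{ij}^3$ weighted by the mean spike rates, and `mean_triangle` for $\langle C_{ij} C_{jl} C_{li} \rangle$. All names and numerical values are hypothetical placeholders, not the empirical moments from the retinal data.

```python
import math

LN2 = math.log(2.0)

def delta_s2(n_cells, scale, mean_c2):
    """Second order term of S(N, F) in bits."""
    return -n_cells * (n_cells - 1) * scale ** 2 * mean_c2 / (4.0 * LN2)

def delta_s3(n_cells, scale, mean_c3_weighted, mean_triangle):
    """Third order term of S(N, F) in bits (assumed form, see lead-in)."""
    pair = n_cells * (n_cells - 1) * scale ** 3 * mean_c3_weighted / 3.0
    tri = n_cells * (n_cells - 1) * (n_cells - 2) * scale ** 3 * mean_triangle / 6.0
    return (pair + tri) / LN2

def is_valid_third_order(n_cells, scale, mean_c2, mean_c3_weighted,
                         mean_triangle, fraction=0.9):
    """True if |dS3| < fraction * |dS2| and the correlated entropy is below
    the independent one. Since dS2 <= 0, the second condition follows from
    the first at this order; it is kept to mirror the stated criteria."""
    d2 = delta_s2(n_cells, scale, mean_c2)
    d3 = delta_s3(n_cells, scale, mean_c3_weighted, mean_triangle)
    return abs(d3) < fraction * abs(d2) and (d2 + d3) < 0.0

# Placeholder moments: every pair has C = 0.05 and mean spike rates are ignored.
c = 0.05
for n_cells, scale in ((20, 1.0), (200, 1.0), (200, 0.1)):
    print(n_cells, scale, is_valid_third_order(n_cells, scale, c ** 2, 0.0, c ** 3))
```

With these placeholder moments the ratio $|\Delta S_3| / |\Delta S_2|$ reduces to $(2/3)(N-2)\,c\,F$, which makes explicit how the boundary of the valid region trades off system size against correlation strength.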



IV. DISCUSSION 

Most of what has been learned about the function of 
the brain, as well as other biological networks, has been 
learned by studying the activity of individual elements:
the spikes generated by single neurons, the expression 
levels of single genes, the concentrations of particular 
metabolites, and so on. Our intuition from statistical 
mechanics is that these large networks should have in- 
teresting collective behaviors. A first step in searching 
for collective effects is to look for correlations between 



elements, and this has been explored in a wide variety of 
experiments; in the case of neural networks, this effort 
dates back roughly forty years [35, 36].

It commonly is observed that correlations among neu- 
rons are weak but widespread. Thus, almost all pairs 
of cells that plausibly are involved in the same neural 
functions have statistically significant, but small, correlations;
examples include the retinal neurons considered
here [11], as well as in cerebral cortex [37]. If we
ask about the implications of these weak correlations for 
the function of pairs of neurons, the answer must be 
that the effects are proportionally small. But because 
the correlations are widespread, it is possible that the 
$\sim N^2$ correlated pairs add up to provide a signature of
a qualitatively important collective effect. Recent work 
has made this idea explicit, using the maximum entropy 
method to map data on the pattern of pairwise corre- 
lations into a statistical mechanics model of the whole 
network [TT1H3QJ]. 

Although the maximum entropy approach to describ- 
ing networks of neurons has had some success, it could 
be that these successes are not probing a regime in which 
collective effects are possible. In particular, if correla- 
tions are weak enough, one can imagine that a mini- 
malist (i.e., maximum entropy) account of their impact 
succeeds, but for the trivial reason that all effects are 
minimal; this pessimistic claim has been made explicitly
[38]. In this setting, pessimism about what we are learning
from maximum entropy analyses of real neural data
is equivalent to optimism about the utility of perturbation
theory. Perhaps amusingly, optimism about the detectability
of collective behavior favored by physics-style
models is equivalent to pessimism about the utility of one
of the physicists' favorite tools, perturbation theory.

Our main technical result is the development of a per- 
turbation series that relates the maximum entropy to the 
observed pattern of pairwise correlations. To use this re- 
sult, we imagine a population of N neurons in which the 
distribution of mean spike rates is what one observes ex- 
perimentally, and the distribution of correlations is as 
observed but scaled uniformly by a factor F. Then we 
can study the entropy as a function of N and F. We 
note that the real system corresponds to F = 1, and 
maximum entropy analyses have been pushed to N = 40 
using real data [THIH]. Figure 3 shows, unambiguously,
that this is outside the regime in which we can expect 
the low orders of perturbation theory to give reliable an- 
swers. Conversely, this means that the successes of the 
maximum entropy approach provide hints of interesting 
collective behavior, which is consistent with the obser- 
vation of multiple locally stable states and an incipient 
critical point [TU [TJ]; for more on criticality in biological
networks, see Ref. [55].

More generally, as we look at larger networks (perhaps
the $\sim 10^2$ transcription factors controlling gene expression
in a single-celled organism, or the tens of thousands
of cells in a small patch of visual cortex), the maximum
correlations that can be captured in low orders of per- 
turbation theory become smaller and smaller. For the 
small patch of retina we have been discussing, the rele- 
vant N ~ 200, where the validity of perturbation theory 
is limited to F < 0.1, corresponding to correlations ten 
times smaller than what is seen experimentally. 
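The shrinking range of validity can be illustrated with a small numerical sketch. It assumes, purely for illustration, a network of zero-mean cells in which every pair shares the same correlation coefficient `c`, and compares the magnitude of the second order term of the entropy expansion with the third order loop term (both taken from the expansion in the appendices, in nats). The function names are hypothetical.

```python
def second_order(n, c):
    """Magnitude of the second order term: (1/4) * sum_{i != j} C_ij^2, uniform C_ij = c."""
    return 0.25 * n * (n - 1) * c**2


def third_order_loop(n, c):
    """Third order loop term: (1/6) * sum over distinct i, j, l of C_ij C_jl C_li."""
    return n * (n - 1) * (n - 2) * c**3 / 6.0


def ratio(n, c):
    """Ratio of third to second order; equals (2/3) * (n - 2) * c for uniform c."""
    return third_order_loop(n, c) / second_order(n, c)


if __name__ == "__main__":
    for n in (10, 40, 200):
        print(n, ratio(n, 0.05), ratio(n, 0.005))
```

In this toy setting the ratio grows linearly with the number of cells, so a correlation that is safely perturbative in a small network stops being so in a large one.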

Much of the work on correlations in biological networks 
is focused on the more limited question of whether a 
particular element in the correlation matrix is statistically
significant. Roughly speaking, if we make K independent
measurements, we expect that the threshold
for statistical significance scales as $C^* \propto 1/\sqrt{K}$. On the
other hand, depending on the pattern of correlations, the
threshold for breakdown of perturbation theory can scale
as $C_s \propto 1/N$ or $C_s \propto 1/\sqrt{N}$. If we are in the limit where
$C_s < C^*$, we have a serious problem: even 'insignificant'
correlations could be so large that they lead to a breakdown
of perturbation theory. Put another way, in this
limit the statistical power of the experiments is so poor 
that it can fail to detect even the signatures of collective 
behavior in the network, let alone more subtle patterns 
of truly weak correlation. This means that the number 
of measurements we need to make to provide a meaning- 
ful characterization of the correlations between two ele- 
ments grows with the size of the network in which these 
elements are embedded. Note that this is in contrast to 
the usual statistical intuition, and provides a sobering 
message for experimentalists. 
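A rough sketch of this bookkeeping follows. It assumes unit prefactors in both scalings, and that the breakdown threshold falls with network size as 1/N or 1/sqrt(N) depending on the pattern of correlations; the function name and the `steep` flag are hypothetical. Requiring the significance threshold to sit at or below the breakdown threshold then gives a minimum number of independent measurements.

```python
def min_measurements(n, steep=True):
    """Smallest K with 1/sqrt(K) <= C_s, for C_s = 1/n (steep) or C_s = 1/sqrt(n).

    Unit prefactors are an assumption; only the scaling with n is meaningful.
    """
    return n * n if steep else n


if __name__ == "__main__":
    for n in (40, 200, 10_000):
        print(n, min_measurements(n, steep=True), min_measurements(n, steep=False))
```

Under these assumptions the required number of measurements grows at least linearly, and possibly quadratically, with the size of the network.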

To summarize, the real patterns of pairwise correla- 
tions in biological networks — certainly the network of 
neurons that we consider in detail — fall outside the 
regime in which the low orders of perturbation theory 
can capture their impact on the states of the network as 
a whole. This is bad news for actually solving the inverse 
problem, but good news in that it means the successes of 
maximum entropy approaches to these networks are not 
simply the result of correlations being weak. 

Appendix A: Details of the expansion



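The pairwise maximum entropy partition function is given by Eq (29), which we restate here for reference:

$$
Z(\{h_i, J_{ij}\}) = Z_0(\{h_i^0\}) \exp\Big(-\sum_i h_i^0 \langle\sigma_i\rangle^{(0)}\Big) \Big\langle \exp\Big(\sum_i \delta h_i\,\delta\sigma_i + \sum_{i<j} J_{ij}\,\delta\sigma_i\delta\sigma_j\Big) \Big\rangle^{(0)} . \tag{A1}
$$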



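We make the following identifications to align the form of our partition function [Eq (A1)] with that of the generic
case discussed earlier:

$$
g_\alpha = g_\alpha^0 + \delta g_\alpha \to \begin{cases} h_i^0 + \delta h_i , & \alpha \to i \\ J_{ij} , & \alpha \to ij \ (i<j) \end{cases}
\qquad
\Delta O_\alpha \to \begin{cases} \delta\sigma_i , & \alpha \to i \\ \delta\sigma_i\delta\sigma_j , & \alpha \to ij \ (i<j) \end{cases}
\qquad
\sum_\alpha \to \sum_i + \sum_{i<j} .
$$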
We obtain an expression which deviates slightly from the corresponding equation for the generic case:

$$
Z(\{g_\mu\}) = Z_0(\{g_\mu^0\}) \exp\Big(-\sum_i h_i^0 \langle\sigma_i\rangle^{(0)}\Big) \Big\langle \exp\Big(\sum_\alpha \delta g_\alpha\,\Delta O_\alpha\Big) \Big\rangle^{(0)} . \tag{A2}
$$



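Utilizing the definition of the cumulant expansion and noting that here $\langle\Delta O_\alpha\rangle_c^{(0)} = \langle\Delta O_\alpha\rangle^{(0)} = 0$ for all $\alpha$, we find
that

$$
Z(\{g_\mu\}) = Z_0(\{g_\mu^0\}) \exp\Big(-\sum_i h_i^0 \langle\sigma_i\rangle^{(0)}\Big)
\times \exp\Big( \frac{1}{2!}\sum_{\mu,\nu} \delta g_\mu \delta g_\nu \langle\Delta O_\mu \Delta O_\nu\rangle_c^{(0)} + \frac{1}{3!}\sum_{\mu,\nu,\lambda} \delta g_\mu \delta g_\nu \delta g_\lambda \langle\Delta O_\mu \Delta O_\nu \Delta O_\lambda\rangle_c^{(0)} + \cdots \Big) , \tag{A3}
$$

where again $\langle\cdots\rangle_c^{(0)}$ represents the cumulant of the enclosed operator with respect to the zeroth order distribution.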
One can show that Eqs (23) to (25) hold now for the pairwise maximum entropy distribution as well. However,
in this instance we can elaborate on the final generic form for the perturbed couplings, Eq (25), as one can explicitly
calculate the susceptibility [Eq (24)] as follows:

$$
\chi_{\mu\nu} = \langle\Delta O_\mu \Delta O_\nu\rangle_c^{(0)} . \tag{A4}
$$



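The indices $\mu$ and $\nu$ can take on the forms $i$ or $ij$, giving us three unique combinations of indices for the quantity $\chi_{\mu\nu}$. A
short calculation shows that one obtains a nonzero value only in cases where the same form of index appears twice in $\chi_{\mu\nu}$.
Therefore one has that

$$
\chi_{\mu\nu} = f(\mu)\,\delta_{\mu\nu} , \tag{A5}
$$

with no sum over the repeated index [on the right hand side of Eq (A5)], where $\delta_{\mu\nu}$ is the Kronecker delta function
and

$$
f(\mu) = \begin{cases} \langle(\delta\sigma_i)^2\rangle^{(0)} & \text{for } \mu = i , \\ \langle(\delta\sigma_i)^2\rangle^{(0)} \langle(\delta\sigma_j)^2\rangle^{(0)} & \text{for } \mu = ij . \end{cases}
$$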
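The diagonal structure of the susceptibility can be checked by brute force for a few cells. The sketch below enumerates all states of a small independent network (an assumed test case with arbitrarily chosen means), builds the single-cell and pair operators, and accumulates their second moments; since every operator has zero mean at zeroth order, these moments are the cumulants. The function name is hypothetical.

```python
import itertools

import numpy as np


def zeroth_order_chi(means):
    """Return operator labels and chi_{mu nu} = <dO_mu dO_nu>^(0) for independent +/-1 spins."""
    m = np.asarray(means, dtype=float)
    n = m.size
    # Operator labels: (i,) for delta sigma_i, (i, j) with i < j for delta sigma_i delta sigma_j.
    labels = [(i,) for i in range(n)] + list(itertools.combinations(range(n), 2))
    chi = np.zeros((len(labels), len(labels)))
    for state in itertools.product((-1, 1), repeat=n):
        s = np.array(state, dtype=float)
        prob = np.prod((1.0 + s * m) / 2.0)  # independent (zeroth order) distribution
        d = s - m                            # deviations delta sigma_i
        ops = np.array([np.prod(d[list(lbl)]) for lbl in labels])
        chi += prob * np.outer(ops, ops)
    return labels, chi


if __name__ == "__main__":
    labels, chi = zeroth_order_chi([0.2, -0.5, 0.7])
    print(labels)
    print(np.round(chi, 6))
```

The off-diagonal entries vanish to numerical precision, and the diagonal entries reproduce the variances and products of variances that define f.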



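Given these results, the expression for the perturbed couplings in Eq (25) becomes

$$
\delta g_\mu = \frac{\langle\Delta O_\mu\rangle}{f(\mu)}
- \frac{1}{2!}\frac{1}{f(\mu)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \langle\Delta O_\mu \Delta O_\beta \Delta O_\gamma\rangle_c^{(0)}
- \frac{1}{3!}\frac{1}{f(\mu)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \frac{\langle\Delta O_\delta\rangle}{f(\delta)} \langle\Delta O_\mu \Delta O_\beta \Delta O_\gamma \Delta O_\delta\rangle_c^{(0)}
+ \frac{1}{2}\frac{1}{f(\mu) f(\gamma)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\sigma\rangle}{f(\sigma)} \frac{\langle\Delta O_\tau\rangle}{f(\tau)} \langle\Delta O_\mu \Delta O_\beta \Delta O_\gamma\rangle_c^{(0)} \langle\Delta O_\gamma \Delta O_\sigma \Delta O_\tau\rangle_c^{(0)} + \cdots , \tag{A6}
$$

where we sum over repeated indices and do not sum over indices in the argument of our function $f(\cdots)$.
We can reconstruct Equation (18) for the entropy in terms of the couplings by starting with

$$
S(\{g_\mu\}) = \ln Z(\{g_\mu\}) - g_\mu \frac{\partial \ln Z(\{g_\mu\})}{\partial(\delta g_\mu)} . \tag{A7}
$$

Taking derivatives with respect to the perturbed couplings one finds that

$$
\frac{\partial S}{\partial(\delta g_\beta)} = -g_\alpha \frac{\partial\langle\Delta O_\alpha\rangle}{\partial(\delta g_\beta)} , \tag{A8}
$$

which implies a relation for the entropy as a function of the couplings analogous to Eq (14),

$$
\frac{\partial S}{\partial\langle\Delta O_\alpha\rangle} = -g_\alpha(\{\langle\Delta O_\mu\rangle\}) . \tag{A9}
$$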
Integrating this equation from the set of zeroth order expectation values (the uncorrelated network) up to the experimental
values of the correlations, one finds that

$$
S = S_0 - g_\alpha^0 \langle\Delta O_\alpha\rangle - \int_{\{\langle\Delta O_\alpha\rangle^{(0)}\}}^{\{\langle\Delta O_\alpha\rangle\}} d\langle\Delta O_\alpha\rangle\, \delta g_\alpha(\{\langle\Delta O_\mu\rangle\}) , \tag{A10}
$$



where $S_0$ is the entropy computed from the zeroth order partition function; namely, the entropy for the noninteracting
or independent system. Again, the values of the one-point correlations $\{\langle\sigma_i\rangle\}$ which one gleans from experiment are
those that result in the no interaction limit, that is, for $i = 1, \ldots, N$,

$$
\langle\sigma_i\rangle^{(0)} = \tanh(h_i^0) . \tag{A11}
$$
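Fixing the fields in this way also fixes the entropy of the independent model. As a minimal sketch, assuming Ising variables taking the values +1 and -1 and working in nats, the code below inverts the tanh relation to get the zeroth order fields and evaluates the independent entropy in the ln cosh form, then compares it with the familiar sum of binary entropies. The function names are hypothetical.

```python
import numpy as np


def independent_entropy(means):
    """S_0 = N ln 2 + sum_i ln cosh(h_i^0) - sum_i <sigma_i> h_i^0, with h_i^0 = arctanh(<sigma_i>)."""
    m = np.asarray(means, dtype=float)
    h0 = np.arctanh(m)  # zeroth order fields that reproduce the measured means
    return float(m.size * np.log(2.0) + np.sum(np.log(np.cosh(h0))) - np.sum(m * h0))


def binary_entropy_sum(means):
    """Sum of single-cell entropies, with P(sigma_i = +1) = (1 + <sigma_i>) / 2."""
    m = np.asarray(means, dtype=float)
    total = 0.0
    for p in (1.0 + m) / 2.0:
        for q in (p, 1.0 - p):
            if q > 0.0:
                total -= q * np.log(q)
    return float(total)


if __name__ == "__main__":
    means = [0.3, -0.6, 0.0]
    print(independent_entropy(means), binary_entropy_sum(means))
```

The two expressions agree for any means strictly between -1 and 1; dividing by ln 2 converts the result to bits.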



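This assumption forces $\langle\Delta O_\alpha\rangle = 0$ for $\alpha = i$ and notably fixes the $N$ zeroth order couplings in the problem. Combining
Eq (A10) for the entropy, and Eq (A6) for the perturbed couplings, one obtains

$$
S = S_0 + \Delta S , \tag{A12}
$$

where for an array of $N$ cells, the independent entropy $S_0$ is given by

$$
S_0 = N \ln 2 + \sum_{i=1}^{N} \ln\Big(\cosh\Big(\tanh^{-1}\langle\sigma_i\rangle^{(0)}\Big)\Big) - \sum_{i=1}^{N} \langle\sigma_i\rangle^{(0)} \tanh^{-1}\langle\sigma_i\rangle^{(0)} , \tag{A13}
$$

and the deviation of the entropy from this independent result is given by

$$
\Delta S = -\frac{1}{2}\frac{\langle\Delta O_\alpha\rangle \langle\Delta O_\alpha\rangle}{f(\alpha)}
+ \frac{1}{3\cdot 2!} \frac{\langle\Delta O_\alpha\rangle}{f(\alpha)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \langle\Delta O_\alpha \Delta O_\beta \Delta O_\gamma\rangle_c^{(0)}
+ \frac{1}{4\cdot 3!} \frac{\langle\Delta O_\alpha\rangle}{f(\alpha)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \frac{\langle\Delta O_\delta\rangle}{f(\delta)} \langle\Delta O_\alpha \Delta O_\beta \Delta O_\gamma \Delta O_\delta\rangle_c^{(0)}
- \frac{1}{4\cdot 2} \frac{1}{f(\lambda)} \frac{\langle\Delta O_\alpha\rangle}{f(\alpha)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \frac{\langle\Delta O_\delta\rangle}{f(\delta)} \langle\Delta O_\alpha \Delta O_\beta \Delta O_\lambda\rangle_c^{(0)} \langle\Delta O_\lambda \Delta O_\gamma \Delta O_\delta\rangle_c^{(0)} + \cdots . \tag{A14}
$$

For notational simplicity we will rewrite Eq (A14), splitting it into its component parts as follows:

$$
\Delta S = \Delta S_2 + \Delta S_3 + \Delta S_4 + \cdots , \tag{A15}
$$

where

$$
\Delta S_2 = -\frac{1}{2}\frac{\langle\Delta O_\alpha\rangle \langle\Delta O_\alpha\rangle}{f(\alpha)} , \tag{A16}
$$

$$
\Delta S_3 = \frac{1}{3\cdot 2!} \frac{\langle\Delta O_\alpha\rangle}{f(\alpha)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \langle\Delta O_\alpha \Delta O_\beta \Delta O_\gamma\rangle_c^{(0)} , \tag{A17}
$$

$$
\Delta S_4 = \frac{1}{4\cdot 3!} \frac{\langle\Delta O_\alpha\rangle}{f(\alpha)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \frac{\langle\Delta O_\delta\rangle}{f(\delta)} \langle\Delta O_\alpha \Delta O_\beta \Delta O_\gamma \Delta O_\delta\rangle_c^{(0)}
- \frac{1}{4\cdot 2} \frac{1}{f(\lambda)} \frac{\langle\Delta O_\alpha\rangle}{f(\alpha)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \frac{\langle\Delta O_\delta\rangle}{f(\delta)} \langle\Delta O_\alpha \Delta O_\beta \Delta O_\lambda\rangle_c^{(0)} \langle\Delta O_\lambda \Delta O_\gamma \Delta O_\delta\rangle_c^{(0)} \tag{A18}
$$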



are the second, third and fourth order contributions to the total deviation of the entropy from its independent value 
respectively. 

To complete the calculation, one can explicitly compute the contribution of each of the terms in Eq (A14) to the
deviation of the entropy from its independent value in Eq (A13). For the sake of completeness we will outline this
computation for the contribution of the third order term $\Delta S_3$ to $\Delta S$ and then simply state the result up to fourth
order.

Appendix B: The third order contribution

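Consider the third order term $\Delta S_3$ (in nats),

$$
\Delta S_3 = \frac{1}{3\cdot 2!} \frac{\langle\Delta O_\alpha\rangle}{f(\alpha)} \frac{\langle\Delta O_\beta\rangle}{f(\beta)} \frac{\langle\Delta O_\gamma\rangle}{f(\gamma)} \langle\Delta O_\alpha \Delta O_\beta \Delta O_\gamma\rangle_c^{(0)} . \tag{B1}
$$

By assumption, $\langle\Delta O_\alpha\rangle = 0$ for $\alpha = i$, so the only terms which contribute to the sum above are those for which $\alpha = ij$.
Note also that

$$
\langle\Delta O_\alpha \Delta O_\beta \Delta O_\gamma\rangle_c^{(0)} = \langle\Delta O_\alpha \Delta O_\beta \Delta O_\gamma\rangle^{(0)} + (\text{terms that vanish}) . \tag{B2}
$$

In the only nonzero sector of the above sum, we have, restoring the sums explicitly in our expression,

$$
\Delta S_3 = \frac{1}{3!}\frac{1}{2^3} \sum_{i\neq j} \sum_{k\neq l} \sum_{m\neq n}
\frac{\langle\delta\sigma_i\delta\sigma_j\rangle \langle\delta\sigma_k\delta\sigma_l\rangle \langle\delta\sigma_m\delta\sigma_n\rangle}
{\langle(\delta\sigma_i)^2\rangle^{(0)} \langle(\delta\sigma_j)^2\rangle^{(0)} \langle(\delta\sigma_k)^2\rangle^{(0)} \langle(\delta\sigma_l)^2\rangle^{(0)} \langle(\delta\sigma_m)^2\rangle^{(0)} \langle(\delta\sigma_n)^2\rangle^{(0)}}
\langle\delta\sigma_i\delta\sigma_j\delta\sigma_k\delta\sigma_l\delta\sigma_m\delta\sigma_n\rangle^{(0)} . \tag{B3}
$$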



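The correlation function $\langle\delta\sigma_i\delta\sigma_j\delta\sigma_k\delta\sigma_l\delta\sigma_m\delta\sigma_n\rangle^{(0)}$
vanishes if the individual deviations $\delta\sigma_i$ are left unpaired in the sum in which they reside. This means that one has
two distinct contributions to the sum, one where the deviations $\delta\sigma_i$ are paired off (e.g., $i = k$, $j = m$, $l = n$) and one
where three indices are equal (e.g., $i = k = m$ and $j = l = n$). The symmetry properties of the sum under investigation
dictate that there are eight identical contributions to $\Delta S_3$ from the former and four identical contributions to $\Delta S_3$
from the latter. Thus we obtain

$$
\Delta S_3 = \frac{1}{3!} \sum_{i\neq j,\, j\neq l,\, i\neq l}
\frac{\langle\delta\sigma_i\delta\sigma_j\rangle \langle\delta\sigma_i\delta\sigma_l\rangle \langle\delta\sigma_j\delta\sigma_l\rangle}
{\langle(\delta\sigma_i)^2\rangle^{(0)} \langle(\delta\sigma_j)^2\rangle^{(0)} \langle(\delta\sigma_l)^2\rangle^{(0)}}
+ \frac{1}{2\cdot 3!} \sum_{i\neq j}
\frac{\langle\delta\sigma_i\delta\sigma_j\rangle^3\, \langle(\delta\sigma_i)^3\rangle^{(0)} \langle(\delta\sigma_j)^3\rangle^{(0)}}
{\big[\langle(\delta\sigma_i)^2\rangle^{(0)}\big]^3 \big[\langle(\delta\sigma_j)^2\rangle^{(0)}\big]^3} .
$$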
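The single-cell moments that enter the second sum can be written out explicitly. As a short worked step, assuming $\sigma_i = \pm 1$ and $\delta\sigma_i = \sigma_i - \langle\sigma_i\rangle^{(0)}$, and using $\sigma_i^2 = 1$:

```latex
\begin{align}
\langle(\delta\sigma_i)^2\rangle^{(0)} &= 1 - \big(\langle\sigma_i\rangle^{(0)}\big)^2 , \\
\langle(\delta\sigma_i)^3\rangle^{(0)} &= -2\,\langle\sigma_i\rangle^{(0)} \Big[ 1 - \big(\langle\sigma_i\rangle^{(0)}\big)^2 \Big] , \\
\frac{\langle(\delta\sigma_i)^3\rangle^{(0)}}{\big[\langle(\delta\sigma_i)^2\rangle^{(0)}\big]^{3/2}}
  &= -\frac{2\,\langle\sigma_i\rangle^{(0)}}{\sqrt{\langle(\delta\sigma_i)^2\rangle^{(0)}}} .
\end{align}
```

The product of two such ratios, one for each cell in the pair, is therefore four times the product of the means divided by the product of the root-mean-square deviations, which turns the prefactor $1/(2\cdot 3!)$ into $1/3$.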



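Writing this in terms of the two point correlations $\{C_{ij}\}$ introduced in Eq (32), we obtain the following expression for
the third order contribution to the entropy

$$
\Delta S_3 = \Delta S_3(\{\langle\sigma_i\rangle^{(0)}\}, \{C_{ij}\})
= \frac{1}{6} \sum_{i\neq j,\, j\neq l,\, i\neq l} C_{ij} C_{jl} C_{li}
+ \frac{1}{3} \sum_{i\neq j} C_{ij}^3\, \frac{\langle\sigma_i\rangle^{(0)}}{(\delta\sigma_i)^{(0)}_{\rm rms}}\, \frac{\langle\sigma_j\rangle^{(0)}}{(\delta\sigma_j)^{(0)}_{\rm rms}} , \tag{B4}
$$

where $(\delta\sigma_i)^{(0)}_{\rm rms} = \sqrt{\langle(\delta\sigma_i)^2\rangle^{(0)}}$.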

Appendix C: The fourth order contribution



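One can employ a similar technique to that of the last subsection in computing the remaining terms in the construction
of the entropy. Here we will simply state the final result for the entropy to fourth order in the correlation
coefficients $\{C_{ij}\}$:

$$
\begin{aligned}
S = S(\{\langle\sigma_i\rangle^{(0)}\}, \{C_{ij}\})
= {} & N + \frac{1}{\ln 2} \sum_{i=1}^{N} \ln\Big(\cosh\Big(\tanh^{-1}\langle\sigma_i\rangle^{(0)}\Big)\Big) - \frac{1}{\ln 2} \sum_{i=1}^{N} \langle\sigma_i\rangle^{(0)} \tanh^{-1}\langle\sigma_i\rangle^{(0)} \\
& - \frac{1}{4\ln 2} \sum_{i\neq j} C_{ij}^2 \\
& + \frac{1}{3\ln 2} \sum_{i\neq j} C_{ij}^3\, \frac{\langle\sigma_i\rangle^{(0)} \langle\sigma_j\rangle^{(0)}}{(\delta\sigma_i)^{(0)}_{\rm rms} (\delta\sigma_j)^{(0)}_{\rm rms}}
+ \frac{1}{6\ln 2} \sum_{i\neq j,\, j\neq l,\, i\neq l} C_{ij} C_{jl} C_{li} \\
& - \frac{1}{24\ln 2} \sum_{i\neq j} C_{ij}^4\, \frac{1 + 3\big(\langle\sigma_i\rangle^{(0)}\big)^2 + 3\big(\langle\sigma_j\rangle^{(0)}\big)^2 + 9\big(\langle\sigma_i\rangle^{(0)}\big)^2 \big(\langle\sigma_j\rangle^{(0)}\big)^2}{\big[(\delta\sigma_i)^{(0)}_{\rm rms}\big]^2 \big[(\delta\sigma_j)^{(0)}_{\rm rms}\big]^2} \\
& - \frac{1}{4\ln 2} \sum_{i\neq j,\, i\neq n,\, j\neq n} C_{ij}^2 C_{jn}^2
- \frac{1}{8\ln 2} \sum_{i,\,j,\,n,\,l \text{ all distinct}} C_{ij} C_{jn} C_{nl} C_{li} + O(C^5) \ \text{bits} .
\end{aligned} \tag{C1}
$$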
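The expansion is straightforward to evaluate numerically. The sketch below is one possible implementation, not the authors' code. It assumes Ising variables taking the values +1 and -1, correlation coefficients normalized by the zeroth order root-mean-square deviations, and the term structure written in the comments; it works in nats (divide by ln 2 for bits). For two cells the pairwise model is exact, so the result can be compared with the entropy computed directly from the joint distribution. All function names are hypothetical.

```python
import numpy as np


def independent_entropy(means):
    """Entropy (nats) of independent +/-1 spins with the given means."""
    m = np.asarray(means, dtype=float)
    h0 = np.arctanh(m)
    return float(m.size * np.log(2.0) + np.sum(np.log(np.cosh(h0)) - m * h0))


def entropy_fourth_order(means, corr):
    """Entropy (nats) through fourth order in the correlation coefficients C_ij."""
    m = np.asarray(means, dtype=float)
    a = np.array(corr, dtype=float)
    np.fill_diagonal(a, 0.0)                  # only off-diagonal C_ij enter
    var = 1.0 - m**2                          # <(delta sigma_i)^2>^(0)
    skew = m / np.sqrt(var)                   # <sigma_i>^(0) / rms_i
    kurt = (1.0 + 3.0 * m**2) / var           # (1 + 3 <sigma_i>^2) / rms_i^2
    b = a**2
    pairs4 = np.sum(b * b)                    # sum_{i != j} C_ij^4
    chains = np.sum(b @ b) - np.trace(b @ b)  # sum over distinct i, j, l of C_ij^2 C_jl^2
    loops3 = np.trace(a @ a @ a)              # sum over distinct i, j, l of C_ij C_jl C_li
    loops4 = np.trace(a @ a @ a @ a) - 2.0 * chains - pairs4  # distinct four-cell loops
    ds2 = -0.25 * np.sum(b)
    ds3 = loops3 / 6.0 + np.sum(a**3 * np.outer(skew, skew)) / 3.0
    ds4 = -np.sum(b * b * np.outer(kurt, kurt)) / 24.0 - chains / 4.0 - loops4 / 8.0
    return independent_entropy(m) + float(ds2 + ds3 + ds4)


def exact_pair_entropy(m1, m2, c):
    """Exact entropy (nats) of two +/-1 spins with means m1, m2 and correlation coefficient c."""
    cov = c * np.sqrt((1.0 - m1**2) * (1.0 - m2**2))
    p = np.array([(1.0 + s1 * m1 + s2 * m2 + s1 * s2 * (m1 * m2 + cov)) / 4.0
                  for s1 in (-1, 1) for s2 in (-1, 1)])
    return float(-np.sum(p * np.log(p)))


if __name__ == "__main__":
    approx = entropy_fourth_order([0.3, -0.2], [[1.0, 0.05], [0.05, 1.0]])
    print(approx, exact_pair_entropy(0.3, -0.2, 0.05))
```

For a pair of cells the two numbers differ only at fifth order in the correlation coefficient. For larger networks, comparing the successive orders `ds2`, `ds3`, and `ds4` gives a quick diagnostic of whether the series is still under control.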